McRunjob is a powerful grid workflow manager used to manage the generation of large numbers of production processing jobs in High Energy Physics. In use at both the DZero and CMS experiments, McRunjob has been used to manage large Monte Carlo production processing since 1999 and is being extended to regular production processing for analysis and reconstruction. Described at CHEP 2001, McRunjob converts core metadata into jobs submittable in a variety of environments. The powerful core metadata description language includes methods for converting the metadata into persistent forms, job descriptions, multi-step workflows, and data provenance information. The language allows structure in the metadata by including full expressions, namespaces, functional dependencies, site-specific parameters in a grid environment, and ontological definitions. It also has simple control structures for parallelization of large jobs. McRunjob features a modular design that allows easy expansion to new job description languages or new application-level tasks.

1. Introduction

McRunjob (Monte Carlo Run Job) was first created in the context of the DZero Experiment at Fermilab during the 1999 DZero Monte Carlo Challenge. At the time, there was no easy, generic way to organize large batches of Monte Carlo jobs, each possibly involving multiple processing steps. McRunjob was originally designed to be generic enough that adding new production processing executables would not pose a significant integration problem for the existing framework, and that different executables could be linked together in possibly complex tree-like workflows in which each node represents a processing step. The main focus of McRunjob is to provide a metadata-based abstraction of each job step, together with tools that allow specification of the metadata, functional dependencies of the metadata among distinct steps, delegation of methods to build and/or run jobs, and linkages to external frameworks, databases, or servers. While McRunjob has been used continuously at DZero since then, it has been in use at CMS for regular production operations only since the end of 2002.

Typically, McRunjob operates during the job building stage to turn structured metadata into jobs. It does this by establishing interfaces to do the following:

• Define and access a unit of schema called a Configurator

• Register functions to the schema to perform job building

• Optionally delegate job building responsibilities to other Configurators

• Support user-driven framework operation

• Support linkages to external databases, catalogs, or resource brokers

• Register parsers to the schema to allow customized access to the Configurator interface as text macros

• Specify dependencies among the metadata elements

• Support rudimentary ontologies through specification of synonyms and versioning

• Support inter-Configurator communication and a user interface through a Configurator container object known as the Linker



2. Architecture of McRunjob 

McRunjob is implemented in Python and consists of three major components:

• The Configurator: Configurators are essentially packages of metadata that describe applications. Configurators can be defined to describe application input, environment, and output. However, since the Configurators are completely generic, they can also describe batch queues, grid execution environments, information from a database, local computing site information, etc. Taken together, the Configurators describe the workflow and the provenance of data.

• The Script Generator: The Script Generator is a specialization of a Configurator that also implements the ScriptGen interface. The ScriptGen interface makes it possible for Configurators to delegate specific job generating tasks to a single common ScriptGen object. This helps keep job generation consistent in an environment where there may be different schemes for creating or handling jobs.

• The Linker: The Linker is a container for Configurators. It also acts as a communication bus for Configurators, a driver for the job building framework, and a user interface to the Linker and Configurator APIs.

Figure 1 shows the simplest McRunjob scenario. A user or Production Coordinator needs to run three applications; call them A, B, and C. Suppose further that the output of A is the input of B and that the output of B is the input of C. The user communicates to the Linker directives to instantiate predefined Configurators corresponding to A, B, and C.¹ Usually such job building directives are kept in a McRunjob macro script, the syntax of which is described below. The user issues a set of configuration macro commands, which are routed to the relevant Configurators. These configuration commands may include specification of values for the schema, specification of inter-Configurator dependencies, and specification of functional dependencies among schema elements in different Configurators. Since each Configurator is required to have a unique description within the Linker space, the Configurators themselves function much like namespaces. An example of a simple functional dependency is B:InputFile = A:OutputFile.² Next, the "MakeJob" and "MakeScript" directives, examples of framework calls, are issued. These particular framework calls cause the Configurators to generate shell scripts to handle their respective applications in serial order. The scripts are then collected by the Linker, and a composite shell script that represents the entire workflow is produced. This procedure can be reset and re-run as many times as desired to produce as many jobs as needed. The procedure is also generic in that targets other than shell scripts (e.g., directed graphs) can be selected by including different ScriptGen modules.
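The A, B, C scenario can be illustrated with a small Python toy model. This is a sketch under our own assumptions, not McRunjob's actual API: the `Configurator` and `Linker` classes, the `resolve` and `make_script` methods, and the `run_a`/`run_b`/`run_c` executables are all hypothetical. It shows Configurator descriptions acting as namespaces, the reference `A:OutputFile` being resolved through the Linker, and the per-step fragments being concatenated into one serial shell script.

```python
class Configurator:
    """Toy metadata container whose description acts as a namespace."""

    def __init__(self, description, app):
        self.description = description
        self.app = app          # hypothetical executable name
        self.metadata = {}

    def make_script(self, linker):
        # Resolve each value, which may be a reference into another namespace.
        args = " ".join(f"--{key}={linker.resolve(value)}"
                        for key, value in sorted(self.metadata.items()))
        return f"{self.app} {args}"


class Linker:
    """Toy container that enforces unique descriptions and resolves references."""

    def __init__(self):
        self.configurators = {}

    def attach(self, cfg):
        if cfg.description in self.configurators:
            raise ValueError(f"duplicate description {cfg.description!r}")
        self.configurators[cfg.description] = cfg

    def resolve(self, value):
        # "A:OutputFile" means: key OutputFile in the namespace of Configurator A.
        if isinstance(value, str) and ":" in value:
            namespace, key = value.split(":", 1)
            if namespace in self.configurators:
                return self.resolve(self.configurators[namespace].metadata[key])
        return value

    def make_script(self):
        # Composite serial shell script: one step per Configurator, attach order.
        lines = ["#!/bin/sh", "set -e"]
        lines += [cfg.make_script(self) for cfg in self.configurators.values()]
        return "\n".join(lines)


linker = Linker()
for name in ("A", "B", "C"):
    linker.attach(Configurator(name, app=f"run_{name.lower()}"))
linker.configurators["A"].metadata["OutputFile"] = "a.out"
linker.configurators["B"].metadata.update(InputFile="A:OutputFile", OutputFile="b.out")
linker.configurators["C"].metadata["InputFile"] = "B:OutputFile"
print(linker.make_script())
```

Running it prints a three-step script in which `run_b` reads `a.out` and `run_c` reads `b.out`; changing A's `OutputFile` and regenerating propagates the new name downstream without touching B or C.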

In addition to modeling the application space, the Configurators also provide a useful abstraction through which to exchange information with external sources such as databases, batch queues, etc. Figure 2 shows a generalized picture of how Configurators may do this. Typically, the user writes a script of McRunjob macro commands, which are interpreted by the Linker framework (shown in light blue in Figure 2). The Linker takes these commands and distributes them to the Configurators attached below. The Configurator layer exposes sets of metadata key-value pairs to the Linker, but with additional customizable backends. For example, one class of Configurators ("InputPlugins") has backends that communicate with external databases, planners, or servers. More conventional Configurators just hold the application metadata. ScriptGenerators collect results from previous Configurators and produce composite workflows (as described above). Finally, a Batch Portal Configurator may take the produced composite script object and submit it to a batch queue.

¹ This is the most common case: the Configurators corresponding to production applications are written beforehand by experts. However, directives also exist for the creation of Configurators and specification of schema "on the fly."

² Such obvious I/O dependencies have a special place in many job handling systems, but McRunjob treats all possible metadata dependencies on an equal footing.

In the DZero context, Monte Carlo production is coordinated with the SAM database at FNAL. Two of the common applications in the workflow are PYTHIA generation and D0gstar (GEANT simulation). The ScriptGenerator targets executable scripts for the DZero executable script environment,³ and one possible execution environment is the SAM/JIM grid service. In upcoming DZero production on the grid, there may be no job scripts as such; instead, the focus is on automatic production of McRunjob macros, which replace scripts and are executable by remote Linkers. Work is also being done to leverage existing McRunjob tools for monitoring on the DZero farms.

Some typical dependency relationships among Configurators include modeling the sequence in which applications have to run on a set of events in order to reach a given data product, or modeling parameter flows in environments where several databases or configuration files may be consulted in the process of job creation. One feature of the McRunjob framework in CMS that is disabled in the DZero framework is the requirement that such dependency relationships be clearly defined before inter-Configurator parameter lookup can take place. This discipline is useful, however, in an environment where a clear provenance of the produced data is not already established by central means. At DZero, data provenance is largely handled by the SAM database.

Three final points can be made. The first is that although McRunjob was conceived in a Monte Carlo production environment, it is immediately well suited to any problem involving complex workflow specification and job templating in a production processing environment. The second is that while McRunjob was designed to describe production workflows in the Monte Carlo setting (i.e., applications and files), there is no reason that it cannot be extended into more fine-grained settings to describe Analysis Object Data (AOD) and their relationships and provenances. Finally, McRunjob typically operates after metadata is specified and before jobs are actually submitted; McRunjob could conceivably be extended to bring parameter lookup services into runtime.

³ There is only one ScriptGen at DZero, so usually no distinction is made.

TUCT007

Computing in High Energy Physics, March 24-28 2003, La Jolla, CA

[Figure 1 diagram: the user ("I want to run apps A, B, and C"), the Linker, a ScriptGen, and three chained Configurators producing "Script or Code to run app A", "Script or Code to run app B", and "Script or Code to run app C", which are combined into "Script or Code to run apps A, B, and C".]

Figure 1: A simple McRunjob scenario. The User or Production Coordinator needs to run three applications A, B, and C. The user communicates with the Linker to attach the appropriate Configurators, set their metadata values, and run the Linker framework to cause the Configurators to produce jobs.

2.1. The Configurator 

The Configurator API provides methods for automating many of the procedures inherent in specifying workflow for Monte Carlo production or analysis. The Configurator is essentially a value-added metadata container. It comprises a special TriggerDictionary class, used to hold the metadata key/value pairs, and the methods provided to manipulate the metadata in a production processing environment.

• The TriggerDictionary allows the user to provide an implementation for the internal dictionary. The implementation must use the regular Python UserDict interface.

• The TriggerDictionary makes calls to user-supplied functions on reads or writes to the internal dictionary implementation.

The TriggerDictionary triggering mechanism is used to implement several Configurator functionalities, such as parameter lookup or construction. It is also used to implement parameter monitoring and watch functions for debugging purposes. The internal implementation object is swappable, enabling GUI linkage on demand. There are four kinds of triggers: (1) Global Read: functions that are called when any element is read. (2) Global Write: functions that are called when any element is written to. (3) Indexed Read: functions that are called only when a specific element is read. (4) Indexed Write: functions that are called only when a specific element is written to. Functions that handle any of the triggers must be registered to the TriggerDictionary object and must accept a Python list as argument. In all cases, the first element in the list is a back reference to the TriggerDictionary object, and the second is the key that was accessed. The remaining elements, if present, are defined at registration time. NOTE: Trigger handlers registered to the TriggerDictionary that alter dictionary state must always interact with the TriggerDictionary using the UntriggeredRead and UntriggeredWrite methods; otherwise an infinite loop could occur.
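A minimal Python sketch of the trigger mechanism follows. It keeps the paper's UntriggeredRead/UntriggeredWrite names and its handler convention (a list holding the back reference, the key, then registration-time arguments), but the `register_*` method names, the firing order (read triggers before the read, write triggers after the write), and the `construct_output` rule are our own assumptions rather than McRunjob's actual code.

```python
from collections import UserDict


class TriggerDictionary:
    """Sketch of a metadata dictionary that fires user-registered triggers."""

    def __init__(self, implementation=None):
        # Any UserDict-compatible object can serve as internal storage.
        self.impl = implementation if implementation is not None else UserDict()
        self.global_read, self.global_write = [], []
        self.indexed_read, self.indexed_write = {}, {}

    def register_global_read(self, func, *extra):
        self.global_read.append((func, extra))

    def register_global_write(self, func, *extra):
        self.global_write.append((func, extra))

    def register_indexed_read(self, key, func, *extra):
        self.indexed_read.setdefault(key, []).append((func, extra))

    def register_indexed_write(self, key, func, *extra):
        self.indexed_write.setdefault(key, []).append((func, extra))

    def _fire(self, handlers, key):
        for func, extra in handlers:
            # [back reference, key, *arguments fixed at registration time]
            func([self, key, *extra])

    def __getitem__(self, key):
        # Read triggers fire first so they can look up or construct the value.
        self._fire(self.global_read + self.indexed_read.get(key, []), key)
        return self.impl[key]

    def __setitem__(self, key, value):
        self.impl[key] = value
        self._fire(self.global_write + self.indexed_write.get(key, []), key)

    # Handlers that alter state use these so they do not re-trigger themselves.
    def UntriggeredRead(self, key):
        return self.impl[key]

    def UntriggeredWrite(self, key, value):
        self.impl[key] = value


writes = []
td = TriggerDictionary()
td.register_global_write(lambda args: writes.append(args[1]))  # a "watch"


def construct_output(args):
    # Hypothetical construction rule: OutputFile = Base + suffix.
    tdict, key, suffix = args
    tdict.UntriggeredWrite(key, tdict.UntriggeredRead("Base") + suffix)


td.register_indexed_read("OutputFile", construct_output, ".root")
td["Base"] = "run42"
print(td["OutputFile"])  # run42.root
```

Because `construct_output` writes through `UntriggeredWrite`, the global write watch records only the explicit assignment to `Base`, and a read trigger that reads its own key this way cannot recurse into itself.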

Figure 2: At the direction of the user through a macro script file, one class of Configurators ("InputPlugins") has backends that communicate with external databases, planners, or servers. More conventional Configurators just hold the application metadata. ScriptGenerators collect results from previous Configurators and produce composite workflows as described above. Finally, a Batch Portal Configurator may take the produced composite script object and submit it to a batch queue.



The fact that the TriggerDictionary can accept any conformant implementation of its internal dictionary structure means that structures with external linkage to graphics or GUI packages can be built for this purpose. Furthermore, these can be "hot-swapped," so that graphics packages or debugging mechanisms can be inserted into running McRunjob programs.
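Hot-swapping can be sketched with a UserDict subclass that reports writes to a listener, standing in for a GUI or debugger. `ObservedDict`, `MetadataHolder`, and `hot_swap` are hypothetical names invented for this illustration; the only assumption carried over from the text is that the replacement follows the UserDict interface.

```python
from collections import UserDict


class ObservedDict(UserDict):
    """UserDict-compatible storage that reports every write to a listener,
    standing in for a hypothetical GUI or debugging hook."""

    def __init__(self, initial=None, listener=print):
        super().__init__()
        self.listener = listener
        self.data.update(initial or {})  # copy silently, without notifications

    def __setitem__(self, key, value):
        self.listener(f"{key} = {value!r}")
        super().__setitem__(key, value)


class MetadataHolder:
    """Minimal stand-in for an object that keeps its metadata in `impl`."""

    def __init__(self):
        self.impl = UserDict()

    def hot_swap(self, factory):
        # Build the replacement from the current contents, then swap it in.
        self.impl = factory(dict(self.impl))


events = []
holder = MetadataHolder()
holder.impl["HelloMessage"] = "Hello World"
holder.hot_swap(lambda data: ObservedDict(data, listener=events.append))
holder.impl["HelloMessage"] = "Salut le Monde"
print(events)
```

Existing metadata survives the swap, and only writes made after the swap are reported.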

Configurators are themselves described by metadata. This metadata is used internally by McRunjob to resolve dependencies, keep track of schema versions, resolve entries in synonym tables, and distinguish Configurators within the memory space of a Linker. Configurators can function within the Linker as namespaces; the ConfiguratorDescription objects allow the namespaces to be referenced.

ConfiguratorDescriptions are generally used internally for two things: to implement inter-Configurator dependencies and to aid in parameter lookup. In the first capacity, a Configurator can declare dependencies on other Configurators. This can occur statically, when a developer is modeling underlying relationships among applications, or dynamically, when a user is modeling relationships among servers, planners, or databases. When a Configurator is added to the Linker, or when the dependencies of Configurators already in the Linker are altered, the dependencies are checked and an exception is thrown if they are not satisfied. The mechanism is that the dependencies of the new or changed Configurator are matched against the list of existing Configurators in the Linker; if there is no match, an exception is thrown. NOTE: This behavior is disabled in DZero.
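The dependency check can be sketched as a set difference between a Configurator's declared dependencies and the descriptions already attached. The `Linker`, `DependencyError`, and `add_requirement` names and the `check_dependencies` switch (standing in for the DZero setting) are hypothetical; only the matching rule comes from the text.

```python
class DependencyError(Exception):
    """Raised when a Configurator's declared dependencies are not satisfied."""


class Linker:
    """Toy Linker that checks declared dependencies whenever they change."""

    def __init__(self, check_dependencies=True):
        self.check_dependencies = check_dependencies  # False mimics DZero
        self.requires = {}  # description -> set of required descriptions

    def _check(self, description, requires):
        missing = set(requires) - set(self.requires)
        if self.check_dependencies and missing:
            raise DependencyError(f"{description} requires {sorted(missing)}")

    def attach(self, description, requires=()):
        self._check(description, requires)
        self.requires[description] = set(requires)

    def add_requirement(self, description, required):
        # Altering an attached Configurator's dependencies is re-checked too.
        updated = self.requires[description] | {required}
        self._check(description, updated)
        self.requires[description] = updated


linker = Linker()
linker.attach("Generator")
linker.attach("Simulation", requires=["Generator"])
try:
    linker.attach("Digitization", requires=["Simulation", "PileUp"])
except DependencyError as err:
    print(err)  # Digitization requires ['PileUp']
```

A failed attach leaves the Linker unchanged, so the user can attach the missing Configurator and retry.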

The Configurators support a parameter lookup service based on namespaces within the Linker and on declared dependencies. Since the ConfiguratorDescriptions of Configurators must be unique within the Linker, they partition the parameters into namespaces. Thus a parameter in the Linker is identified by a complete specification of the ConfiguratorDescription together with the parameter name. From a Configurator's point of view, a parameter in Configurator B is visible only if there exists a declared dependency on Configurator B. This last behavior is also disabled in DZero.
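In a toy model, namespaced lookup reduces to keying parameters by the pair (description, name) and consulting the caller's declared dependencies before answering. The class, method names, and file name below are hypothetical; `enforce_visibility=False` stands in for the DZero behavior.

```python
class Linker:
    """Toy Linker whose parameters are keyed by (description, name)."""

    def __init__(self, enforce_visibility=True):
        self.enforce_visibility = enforce_visibility  # False mimics DZero
        self.params = {}    # description -> {parameter name: value}
        self.requires = {}  # description -> declared dependencies

    def attach(self, description, params, requires=()):
        self.params[description] = dict(params)
        self.requires[description] = set(requires)

    def lookup(self, caller, target, name):
        """Return parameter `name` of Configurator `target`, as seen by `caller`."""
        visible = target == caller or target in self.requires[caller]
        if self.enforce_visibility and not visible:
            raise LookupError(f"{caller} has no declared dependency on {target}")
        return self.params[target][name]


linker = Linker()
linker.attach("Generator", {"OutputFile": "gen.ntpl"})
linker.attach("Simulation", {}, requires=["Generator"])
linker.attach("Analysis", {})
print(linker.lookup("Simulation", "Generator", "OutputFile"))  # gen.ntpl
try:
    linker.lookup("Analysis", "Generator", "OutputFile")
except LookupError as err:
    print(err)  # Analysis has no declared dependency on Generator
```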

Configurators contain synonym tables. These are lookup tables that translate local metadata key names into different key names in other Configurator types. The behavior of a workflow can therefore change depending upon which synonyms are loaded at any given time. The synonym tables can be loaded for different environments or changing versions, thus providing a rudimentary ontology.
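A synonym table can be pictured as a mapping from (local key, target Configurator type) to the key name used by the target. In this sketch the two tables, the `resolve` helper, and the key names `EventFile` and `gen_v2.evt` are hypothetical; loading one table or the other changes what the same local key refers to, as the text describes.

```python
from typing import Dict, Tuple

SynonymTable = Dict[Tuple[str, str], str]

# Hypothetical tables for two versions of a Generator schema.
GENERATOR_V1: SynonymTable = {("InputFile", "Generator"): "OutputFile"}
GENERATOR_V2: SynonymTable = {("InputFile", "Generator"): "EventFile"}


def resolve(local_key: str, target_type: str,
            target_metadata: Dict[str, str], synonyms: SynonymTable) -> str:
    """Fetch the value of `local_key` from a Configurator of `target_type`,
    translating the key name through the currently loaded synonym table."""
    target_key = synonyms.get((local_key, target_type), local_key)
    return target_metadata[target_key]


gen_v1 = {"OutputFile": "gen_v1.ntpl"}
gen_v2 = {"EventFile": "gen_v2.evt"}
print(resolve("InputFile", "Generator", gen_v1, GENERATOR_V1))  # gen_v1.ntpl
print(resolve("InputFile", "Generator", gen_v2, GENERATOR_V2))  # gen_v2.evt
```

The downstream Configurator keeps asking for `InputFile`; only the loaded table changes between schema versions.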

Finally, Configurators can have explicit metadata translation or construction rules attached directly to each metadata element. These are available to the developer, but not yet available in the macro script language.

Examples of Configurators include those that have connectivity with external databases (e.g., RefDB in CMS through SQL queries, or SAM in DZero through system commands), those that model application steps (e.g., Monte Carlo generation, detector simulation, digitization), and those that submit jobs to specified batch portals (e.g., LSF or PBS batch systems, Condor, DAGMan/Condor-G).

2.2. The Script Generator 

One of the problems encountered in practice with the above model, in which Configurators generate custom bits of code that are then collected by the Linker for submission to an execution manager, is that nothing is in place to help guarantee that all of the independently generated bits of code will be compatible. For example, they may be targeted for an environment in which the code bits cooperate at runtime through non-McRunjob interfaces. Without further structure, the only way to organize this is at the level of the Configurator itself, so the number of modules potentially needing modification in case of a change to the runtime environment is as large as the number of Configurators.

ScriptGen is a special interface, implemented by some Configurators, that enables other Configurators to delegate specific calls to a single Configurator. In the case of delegation, the ScriptGen must declare the ConfiguratorDescriptions and method calls that it can handle. The delegating Configurator must specify which method calls it will delegate and the description of the ScriptGen module to which it is delegating. With this functionality, a new way to organize code is available: all of the script generating code targeting a specific runtime environment can be collected in a single ScriptGen module. The ScriptGen module is also usually the agent that the Linker uses to collect code bits targeted for a specific environment in order to create a composite job or DAG.
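The delegation contract (the ScriptGen declares what it can handle; the Configurator declares what it hands off and to whom) can be sketched as below. The `ScriptGen` and `Configurator` classes, `can_handle`, and `call` are hypothetical; `MakeScript` borrows the framework message name from the paper, and the echoed message follows the Hello World example in Section 4.1.

```python
import shlex


class ScriptGen:
    """Hypothetical ScriptGen: holds all code targeting one runtime environment."""

    def __init__(self, description, handles):
        self.description = description
        self.handles = handles  # {Configurator type: set of method names}

    def can_handle(self, cfg_type, method):
        return method in self.handles.get(cfg_type, set())

    def MakeScript(self, cfg):
        # The only place that knows how this environment runs a HelloWorld step.
        return f"echo {shlex.quote(cfg.metadata['HelloMessage'])}"


class Configurator:
    """Toy Configurator that can hand named calls off to a ScriptGen."""

    def __init__(self, cfg_type, metadata, delegate=None, delegated=()):
        self.cfg_type = cfg_type
        self.metadata = metadata
        self.delegate = delegate         # the ScriptGen being delegated to
        self.delegated = set(delegated)  # method names handed off

    def call(self, method):
        if method in self.delegated:
            if not self.delegate.can_handle(self.cfg_type, method):
                raise NotImplementedError(
                    f"{self.delegate.description} cannot handle {method}")
            return getattr(self.delegate, method)(self)
        return getattr(self, method)()


gen = ScriptGen("HelloWorldScriptGen", {"HelloWorld": {"MakeScript"}})
german = Configurator("HelloWorld", {"HelloMessage": "Hallo Welt"},
                      delegate=gen, delegated=["MakeScript"])
print(german.call("MakeScript"))  # echo 'Hallo Welt'
```

Retargeting every HelloWorld step to a new runtime environment then means editing one `MakeScript` method rather than each Configurator.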

Examples of different ScriptGen modules in CMS are the default ImpalaScriptGen module, which generates executable scripts compatible with the legacy Impala production environment; the ImpalaLiteScriptGen module; the CMSProdScriptGen module; the VDLScriptGen module, for generating specifications written in the Chimera Virtual Data Language; and the MOPDagGen module, for taking the output of other specified ScriptGen modules and producing a Directed Acyclic Graph (DAG) for use by the Condor DAGMan tool.

2.3. The Linker 

The Linker is a container class for Configurators. It handles all communication between the user and the Configurators and between any two Configurators. It also contains a repository for "script Objects." Configurators that need to generate code bits to implement a given workflow or job can store these bits in the Linker as script Objects. As described above, a ScriptGen module may later collect script Objects targeted for a specific environment and create a composite script Object. It may also, as in the case of MOPDagGen, wrap existing script Objects or composites into a DAG.
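As an illustration of wrapping collected script Objects into a DAG, the sketch below emits a Condor DAGMan input file using DAGMan's `JOB name submitfile` and `PARENT ... CHILD ...` lines. The `ScriptRepository` class, the `make_dag` function, and the submit file names are hypothetical; this is not MOPDagGen's actual code.

```python
class ScriptRepository:
    """Toy stand-in for the Linker's repository of script Objects."""

    def __init__(self):
        self.objects = []  # (target environment, job name, submit file)

    def store(self, environment, name, submit_file):
        self.objects.append((environment, name, submit_file))

    def collect(self, environment):
        return {name: submit for env, name, submit in self.objects
                if env == environment}


def make_dag(jobs, parents):
    """Wrap collected script Objects into a Condor DAGMan input file.

    jobs: {job name: submit description file}
    parents: {child job name: [parent job names]}
    """
    lines = [f"JOB {name} {submit}" for name, submit in jobs.items()]
    for child, parent_names in parents.items():
        lines.append(f"PARENT {' '.join(parent_names)} CHILD {child}")
    return "\n".join(lines) + "\n"


repo = ScriptRepository()
for step in ("A", "B", "C"):
    repo.store("condor", step, f"{step.lower()}.sub")
repo.store("shell", "debug", "debug.sh")  # objects for other targets are ignored
print(make_dag(repo.collect("condor"), {"B": ["A"], "C": ["B"]}), end="")
```

The output declares jobs A, B, and C and the edges A→B and B→C, ready to be handed to `condor_submit_dag`.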

The Linker also supports some simple looping structures within the McRunjob macro scripts, and also drives the framework, described in the next section.



3. The Framework 

The Configurators build jobs together by contributing their specialized knowledge of application steps or external resources to the overall whole in structured ways. One part of this structure is the Configurator dependencies.⁴ Another structure, which organizes the order in which tasks are completed, is the Framework. The framework is essentially a sequence of strings used as messages sent to framework handlers in the Configurators. The messages can include things like Reset, MakeJob, or MakeScript for shell script building, listing of derivations and transformations in Chimera, etc.

Traditionally in McRunjob, framework calls are handled directly by the Configurators themselves through subclassing of the framework handling methods. However, to support flexibility without using inheritance, the Configurator base class also provides methods for registering functions (possibly user-supplied, in certain simple cases) to handle specific framework messages. As described above, as a double indirection supporting code maintenance tasks, these functions can also be registered to a special Configurator that inherits the ScriptGen interface, and then delegated.

The Linker thus provides the drumbeat to which the Configurators march: it provides a context within which to order the Configurators by their dependencies and a framework within which to sequence method invocations.
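The drumbeat can be sketched as two nested loops: for each framework message, visit every Configurator in dependency order and call its registered handler, if any. The classes and function names are hypothetical, and the ordering uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) as our choice of topological sort; when dependencies are disabled, attachment order would be used instead.

```python
from graphlib import TopologicalSorter  # Python 3.9+


class Configurator:
    """Toy Configurator whose framework handlers are registered functions."""

    def __init__(self, name, requires=()):
        self.name = name
        self.requires = set(requires)
        self.handlers = {}  # framework message -> function(cfg, log)

    def register(self, message, func):
        self.handlers[message] = func

    def handle(self, message, log):
        func = self.handlers.get(message)
        if func is not None:  # messages without a handler are ignored
            func(self, log)


def run_framework(configurators, messages, log):
    """Send each framework message to every Configurator in dependency order."""
    by_name = {cfg.name: cfg for cfg in configurators}
    graph = {cfg.name: cfg.requires for cfg in configurators}
    order = list(TopologicalSorter(graph).static_order())
    for message in messages:
        for name in order:
            by_name[name].handle(message, log)


log = []
a, b, c = Configurator("A"), Configurator("B", ["A"]), Configurator("C", ["B"])
for cfg in (a, b, c):
    cfg.register("MakeJob", lambda self, log: log.append(f"MakeJob {self.name}"))
run_framework([c, b, a], ["Reset", "MakeJob"], log)  # attached out of order
print(log)  # ['MakeJob A', 'MakeJob B', 'MakeJob C']
```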



⁴ Or, when not enabled, just the order in which Configurators are added.

4. The Macro Language

The McRunjob macros are intended to provide a user interface to the Configurator and Linker APIs. It is possible to construct the macros as a complete declarative specification of the workflow, but even in a procedural environment where parameters are being "constructed" or "discovered" in external databases, the resulting state of the McRunjob program can at any time be dumped in declarative format. Thus macros can also serve as a rudimentary "provenance" for the described or constructed workflow.
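Dumping state in declarative form can be sketched as writing each Configurator's current values back out as macro commands, using the `attach`, `cfg`, `additem`, and `define` syntax listed below. The `dump_macros` function and its input format are hypothetical, and emitting an `additem` before every `define` is our simplification; constructed or discovered values are written as plain literals, which is what makes the dump replayable.

```python
def dump_macros(configurators):
    """Write the current state back out as declarative macro commands.

    configurators: list of (type, name, {key: value}) in attachment order;
    values are assumed to be already resolved, constructed, or discovered.
    """
    lines = []
    for cfg_type, name, metadata in configurators:
        ref = f"{cfg_type} named {name}"
        lines.append(f"attach {ref}")
        for key, value in metadata.items():
            lines.append(f"cfg {ref} additem {key}")
            lines.append(f"cfg {ref} define {key} {value}")
    return "\n".join(lines)


state = [("HelloWorld", "English", {"HelloMessage": "Hello World"})]
print(dump_macros(state))
```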

The Linker macros comprise commands that attach Configurators, route macro commands to specified Configurators, and simple looping and conditional constructs. In the Configurator, the handling of macros is done in a "class distributed" fashion. Configurator classes can have macro handlers registered to them, so it is very easy to extend their macro interfaces. A particular Configurator object passes a particular macro to each of the registered macro handlers until it finds one that can handle that macro. The Configurator base class registers a base parser, which is called last, and Configurator subclasses extend this. Following is a list of simple Linker directives:

• "attach cfgIdentifier" attaches a Configurator of the given type.

• "cfg cfgIdentifier cmd" issues the macro "cmd" to the specified Configurator.

• "framework run cmd" issues the framework message "cmd" to all Configurators in sequence. Framework commands can be grouped together and run in groups as well.

Following is a list of simple Configurator macros: 

• "additem keyname" adds a metadata element named "keyname".

• "define keyname expression" sets the value of "keyname" to "expression," where expression can be a literal, a reference to the value of a key in another Configurator, a reference into the internal Configurator synonym table,⁵ or a directive to construct the value by a registered function.

• "addreq cfgIdentifier" adds cfgIdentifier as a dynamic dependency for this Configurator.

• "synonym key ::cfgIdentifier:newkey" defines a possible synonym for "key" that targets "newkey" in another Configurator.

• "oncall fmk do cmd" stores the command "cmd" and executes it on receipt of the framework call "fmk".
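The class-distributed handling of Configurator macros can be sketched as a per-class chain of parsers, tried in turn until one accepts the macro, with the base parser always last. Everything here (`register_parser`, `base_parser`, and the `shout` macro) is hypothetical and only illustrates the chain; the base parser handles `additem` and literal-valued `define` only, not references or construction directives.

```python
class Configurator:
    """Toy Configurator with a class-distributed chain of macro parsers."""

    parsers = []  # class-level registry, searched front to back

    def __init__(self):
        self.metadata = {}

    @classmethod
    def register_parser(cls, parser):
        # Assigning a new list gives each subclass its own registry; putting
        # the new parser in front keeps the base parser last in the chain.
        cls.parsers = [parser] + cls.parsers

    def macro(self, line):
        words = line.split()
        if not words:
            return
        for parser in self.parsers:
            if parser(self, words):
                return
        raise ValueError(f"unhandled macro: {line!r}")


def base_parser(cfg, words):
    """Handles 'additem keyname' and 'define keyname expression' (literals only)."""
    if words[0] == "additem" and len(words) == 2:
        cfg.metadata.setdefault(words[1], None)
        return True
    if words[0] == "define" and len(words) >= 3:
        cfg.metadata[words[1]] = " ".join(words[2:])
        return True
    return False


Configurator.register_parser(base_parser)


class HelloWorld(Configurator):
    pass


def shout_parser(cfg, words):
    """Hypothetical subclass-only macro: 'shout' upper-cases HelloMessage."""
    if words == ["shout"]:
        cfg.metadata["HelloMessage"] = cfg.metadata["HelloMessage"].upper()
        return True
    return False


HelloWorld.register_parser(shout_parser)

hello = HelloWorld()
for line in ("additem HelloMessage", "define HelloMessage Hallo Welt", "shout"):
    hello.macro(line)
print(hello.metadata)  # {'HelloMessage': 'HALLO WELT'}
```

Extending a subclass's macro interface touches only that subclass's registry, so a plain `Configurator` still rejects `shout`.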

Macros can source other macros. In this way, McRunjob macro commands can be separated into synonym definitions and stored commands on the one hand, and pure workflow descriptions on the other. The former are seen as part of the environment and are in some sense independent of the pure workflow descriptions. The management of these environments leads to a rudimentary ontological management system.

4.1. The "Hello World" Example 

In the CMS implementation of McRunjob, a Hello World example is provided, which consists of a HelloWorld Configurator with metadata element HelloMessage and a HelloWorldScriptGen that also serves as a metadata server. Each HelloWorld Configurator is equipped to produce a short script that echoes its HelloMessage to the screen. The HelloWorldScriptGen collects these scripts into a composite. The following is a simple example macro script fragment that would print out a HelloWorld message in English, French, and German.⁶

# Attach the ScriptGen, which will in this
# case also serve metadata values to the
# HelloWorld configurators
attach HelloWorldScriptGen

cfg HelloWorldScriptGen additem English
cfg HelloWorldScriptGen define English \
Hello World
cfg HelloWorldScriptGen additem French
cfg HelloWorldScriptGen define French \
Salut le Monde
cfg HelloWorldScriptGen additem German
cfg HelloWorldScriptGen define German \
Hallo Welt

# Attach the HelloWorld Configurators
# themselves
attach HelloWorld named English
attach HelloWorld named French
attach HelloWorld named German

# Enable HelloWorld to delegate script
# generation to ScriptGen. (This also
# sets correct dependencies.)
cfg HelloWorldScriptGen register HelloWorld

# Route the metadata to correct
# configurators
cfg HelloWorld named English define \
HelloMessage ::HelloWorldScriptGen:English
cfg HelloWorld named French define \
HelloMessage ::HelloWorldScriptGen:French
cfg HelloWorld named German define \
HelloMessage ::HelloWorldScriptGen:German

# Fork the resulting jobs in background.
# Set it to get the executables list every
# time "RunJob" is executed.
attach Fork
cfg Fork define ScriptGenName \
HelloWorldScriptGen
cfg Fork oncall RunJob do \
define ExecutableList ::construct

Upon invocation of the framework, this macro will result in the sequence of framework calls shown in Table I and will produce the output

Hello World
Salut le Monde
Hallo Welt

⁵ Real expressions such as "a+b/c" are not yet supported.

⁶ This uses new syntax instituted as of May 2003.

Table I: Framework operation in the Hello World example. The sequence goes from left to right and then from top to bottom, like reading a book (in English).


5. Conclusions and Future Plans 

McRunjob has been successfully used in both the DZero and CMS experiments to model HEP workflows for Monte Carlo production, both on locally controlled farm resources and in Grid environments. In both experiments, there is a desire to see how far McRunjob can be extended into the realm of interactive analysis; the extension to batch analysis should be straightforward. More immediately, full expression support will be added to the macro language. A common project at Fermilab between USCMS and DZero is also being started to address common goals and support issues.

There are many exciting directions being explored. In the context of DZero, runtime McRunjob is being explored as an answer to the need for monitoring jobs on the farms. The declarative specifications of jobs are converted to XML and stored in a local XML database, and the McRunjob-created job is instrumented to update this database. Furthermore, the extension of the rudimentary ontologies described above presents an interesting research problem as the environments (as defined above) become large. Also, how the workflow description plus annotations from the environment inform the provenance of a particular derived data product is an open question. Finally, as the Grid itself adopts a more Web Services-oriented model of operation, it may become important to include extensions to proposed standards such as the Web Services Flow Language (WSFL).